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Abstract 

We present c?PDE, a new multivariate analysis technique for parameter estimation. 
The method is based on a direct construction of joint probability densities of known 
variables and the parameters to be estimated. We show how posterior densities and 
best-value estimates are then obtained for the parameters of interest by a straight- 
forward manipulation of these densities. The method is essentially non-parametric 
and allows for an intuitive graphical interpretation. We illustrate the method by 
outlining how it can be used to estimate the mass of the top quark, and we explain 
how the method is applied to an ensemble of events containing background. 
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1 Introduction 



In an earlier paper |IJ we introduced the PDE (Probability Density Estima- 
tion) method, an essentially non-parametric and multivariate method designed 
for identifying small signals among large backgrounds. The method makes use 
of kernel density estimates for signal and background probability densities, 
and a simple discriminant function is then used to classify candidate events. 
The PDE method was applied successfully to the search for the top quark at 
the Fermilab Tevatron, and it is an integral part of a general search strategy 0] 
for analyzing data from high-energy physics experiments. 

In this paper we present <xPDE, an extension of the PDE method designed 
for parameter estimation, where a represents a vector of parameters to be 
estimated. In many applications a is a single parameter, such as the mass of 
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an unstable particle detected through its decay products. This non-parametric 
and multivariate method may be particularly applicable to problems such as 
determining the mass of the top quark in the upcoming collider run (Run II) 
of the Fermilab Tevatron. 

Multivariate methods are now widely recognized as being more powerful than 
univariate methods, and a non-parametric method has the advantage that one 
need not make assumptions about the forms of probability distributions. Those 
who feel uneasy about the "black-box" quality of neural networks should wel- 
come the straightforward manipulation of probability densities used in this 
method, and the intuitive graphical interpretation that results. Because prob- 
ability densities are constructed and manipulated directly, obtaining any ad- 
ditional statistical information — Bayesian credible intervals, for example - 
is a straightforward exercise. 

A typical parameter estimation problem is described in Sec. 2; our recipe for 
solving it is provided in Sec. 3. The salient features of the method and its 
potential advantages are summarized in Sec. 4. 



2 The problem 

The next decade of high energy collider physics will emphasize measurements 
and searches for new phenomena at the scale of several hundred GeV. The 
existence of a new particle at this scale can be convincingly demonstrated by 
observing a peak in an invariant mass distribution, but the signature may 
be such that more indirect methods of establishing the particle's existence, 
and subsequently measuring parameters such as its mass and couplings, are 
required. We introduce 5PDE with an example of this nature: the determi- 
nation of the top quark mass. Top quarks are pair-produced at the Fermilab 
Tevatron, each decaying promptly to a If boson and a b quark. Each W 
boson in turn decays either to a charged lepton and a neutrino, or to two 
quarks. Quarks hadronize, appearing in the detector as collimated flows of 
energy (jets). The characteristic experimental signature for a top quark event 
is therefore a final state containing either an energetic lepton, missing trans- 
verse energy, and several energetic jets, or a final state containing two energetic 
leptons, missing transverse energy, and a pair of jets; decays to six jets are 
difficult to distinguish from events in which no top quark was produced. The 
application of selection criteria favoring events with jets originating from b 
quarks enhances the fraction of top quark events in the sample. 

For the sake of simplicity we assume that two variables x = (x, y) have been 
identified for this analysis. This pair might be the transverse energies of the 
lepton and the leading jet; it might be the invariant mass of the sub-leading 
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jets and the transverse momentum of the W boson; it might be the scalar 
sum of all jet transverse energies and the output of a neural network built 
with event-shape variables. No special assumptions about the nature of these 
variables need be made. 



3 The Recipe 

The goal is to construct a method that performs as well as (or better than) such 
popular algorithms as neural networks, but to keep the method sufficiently 
simple that it reads like a recipe. The recipe follows. 

3. 1 Specify p{m) 

This method has its roots in Bayesian statistics, and as a result it has the 
advantage (disadvantage) of enabling (requiring) the specification of a function 
p{m\X) that encodes prior beliefs about the value of the top quark mass m. 
X here is used in standard Bayesian notation to represent all assumptions 
implicit in our specification of this prior probability. The basic assumptions 
contained in X will not change, so we drop it from here on, writing simply 
p(m). A natural choice for p(m), used when there is strong belief that the true 
mass must lie somewhere between a and b but no reason to prefer any value 
within that range over any other, is the flat prior: p{m) = ^ for a < m < b, 
and elsewhere. 



3.2 Generate Monte Carlo events 

Monte Carlo events are generated with top quark masses m pulled from the 
distribution p(m) specified above. That is, the probability that an event with 
a top quark mass between m and m + 5m is generated is p(m) 5m. For each 
Monte Carlo event we calculate the two variables x = (x,y). 

A histogram in (x, y, m) filled with the generated events approximates the 
joint density p(x,y,m). This function has the property that, given an event 
in which a top quark is produced and decays to the observed final state, the 
probability that the top quark mass was between m and m + 5m, the first 
variable between x and x + 5x, and the second variable between y and y + 5y, 
is simply p(x, y, m) 5x 5y 5m. 
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3. 3 Construct a training array T 



Each of the iV Monte Carlo events just generated is characterized by three 
numbers: the value of x, the value of y, and the top quark mass to. The Monte 
Carlo events are labeled with the index % {% = 1,..,N); the three numbers 
corresponding to the i th event are then Xi, yi, and mi. Define the event vector 
Vi for the i th Monte Carlo event by 

Vi = (xi,mi), (1) 



and define the training array T for the entire set of Monte Carlo events by 
Tij = (Vi)j. (2) 

Here and below i ranges from 1 to N and indexes the Monte Carlo events; j 
ranges from 1 to 3 and indexes the components of the event vector v. 

3.4 Calculate the covariance matrix 

Having defined the event vector v, calculate the mean event vector 

<tf> = iE3 ( 3 ) 

and construct the training covariance matrix 

*» = ^j:m*-® k m)i-<fti) (4) 
i=i 

in the standard way S is a 3 by 3 symmetric matrix, with £12 = Cov(x,y), 
£13 = Cov(x, to), and so on. 

3.5 Estimate the joint density p(v) 



In Sec. 3.2 we imagined filling a three-dimensional histogram in v with Monte 
Carlo events, and we recognized that the resulting histogram represents an 
estimation of a probability density. A well-known technique in multivariate 
statistics involves estimating a probability density not by filling a histogram, 
but rather by summing kernels of probability placed around each point. Due 
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to its familiarity and smoothness properties, a favorite kernel choice is the 
multivariate gaussian: 

K(v) = 1 - exp ( "^f" 1 ^ • (5) 



The vector v is the same three-component vector defined above, and £ _1 is the 
inverse of the training array covariance matrix E. The parameter h is known 
in the language of density estimation as a smoothing parameter, it controls the 
width of the kernels placed around each point. Theoretical arguments suggest 
an optimal choice of h « JV -1 '^ +4 ^ as a function of the number of data points 
N and the dimensionality d of the variable spaceJJ 

An estimate of the joint probability density p{v) is then obtained simply by 
summing kernels centered about each of the N data points Vi, so that 

1 N 

v{v) = -Y,K{v-v l ). (6) 

iV i=i 



3.6 Compute the posterior density p{m\x) 



A physicist attempting to measure the top quark mass is interested in the 
posterior density p(m\x) for m. In words, p(m\x) is the probability that the 
top quark mass is m given that we have observed an event with variable values 
x. This posterior density is easily obtained. The probability of obtaining both 
x and m is equal to the probability of obtaining x multiplied by the probability 
of obtaining m given that you have obtained x: 

p(x,m) — p(x)p(m\x), (7) 



and the probability of obtaining x is given by integrating the probability of 
obtaining both x and m over all values of m: 

p(x) = p(x,m') dm'. (8) 



This expression for h depends on assumptions about the probability density that 
we have not made explicit, and is not exact 0, ffl. In practice, h may be optimized 
for any set of Monte Carlo events by constructing and minimizing some appropriate 
error estimate x(h). For N = 10 5 and d = 3, the optimal choice for h is roughly 
0.20. 
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Thus the posterior density p{m\x) is related to the joint density p(x, m) simply 



3.7 Compute m (optional) 

In the Bayesian view, the posterior density is the natural result of this recipe. 
Nonetheless, it is often convenient to reduce the posterior density p(m\x) to a 
single number m representing the best estimate of the parameter in question. 
Among the natural choices for the best estimate are the mean, median, and 
mode of the posterior distribution. Adopting the last for the purposes of this 
discussion, we solve the equation 

p(m\x) = m&xp(m\x) (10) 

numerically for m. Since the denominator of Eq. 9 is independent of m, max- 
imizing the posterior density p{m\x) is equivalent to maximizing the joint 
density p(x,m), which we have constructed explicitly. The extent to which 
the posterior density p(m\x) peaks around the value rh depends, of course, on 
how strongly the variables x correlate with the true mass m. 

We note that this method can be modified to produce results that obey the 
frequentist notion of coverage. Assume that a 68% confidence region is de- 
sired. Starting with p(x\m), draw for each fixed m the contour C m in x-space 
enclosing 68% of the density and minimal area. Then upon observing x in the 
data, the 68% confidence region for m is the union of all values of m for which 
x lies inside C m . 



Extension to the case of an ensemble of data events is treated in Appendix C. 



4 Conclusions 

The analysis method described here is quite general, and can be used in the 
context of any parameter estimation problem. The non-parametric approach 
used to estimate probability densities is helpful when the distributions under 
consideration do not lend themselves to an obvious parameterization. c?PDE 
allows the use of several measured variables, and enables the simultaneous 
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p(m\x) 




(9) 



/ p(x, m') dm' 
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estimation of several parameters. The generalization to arbitrary dimension is 
provided in Appendix A. Bayesian credible intervals and moments are easily 
obtained from simple manipulations of the joint probability density. 



A The general multivariate case 

For pedagogical reasons, aPDE has been introduced through a specific exam- 
ple — determining the mass m of the top quark from two measured quantities 
x and y — and the expressions in the text are therefore specific to that ex- 
ample. In this appendix we provide the formulae for the general case. 

In the general case, let each event be characterized by d\ known variables x 
and d 2 unknown parameters a. Let d = di + d 2 , and let the ci-dimensional 
event vector be v = (x, a). 

The i th Monte Carlo event is now described by the event vector 

v i = (x i ,a i ), (A.l) 

and the entire Monte Carlo sample is described by the training array 

T tJ = (A.2) 

where j now ranges from 1 to d. The mean event vector is 

<*> = -SfE* ( A -3) 

JV i=i 

and the training covariance matrix is 

1 N 

JV 1=1 

as before, and the general multivariate gaussian is given by 
Finally, in Eqs. 9 and 10, m should be replaced by the vector a. 
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Fig. B.l. A sample function £(x,m) that might be constructed from Monte Carlo 
events at masses m = 10, 20, 30, 40, and 50. Notice the ridges in this function, due 
to the fact that it is constructed from events at specific masses. 

In practice, limited computing resources place an upper bound on N, and 
hence an upper bound on d. The optimal accuracy of the kernel estimate is 
of order N~ s ^ 2s+d \ where s is a positive integer that reflects the assumed 
smoothness of the unknown density, and a typical assumption of continuous 
and square integrable derivatives up to second order corresponds to s — 2 ||, fj|. 



B Alternative to generating a random sample of Monte Carlo 
events 

In this appendix we describe a modification to the procedure described in the 
text if practical constraints prevent the generation of events pulled from a 
continuous prior p(m), but allow the generation of events at q discrete values 
rrij, where j = 1, .., q. 

Two changes are required in the first five steps of the recipe (Sees. 3.1-3.5). 
First, it is assumed that practical constraints require Monte Carlo events to 
be generated at the discrete masses rrij, rather than as described in Sec. 3.2. 
Second, the function calculated in Eq. 6, which may no longer be interpreted 
as a joint density, should be re-labeled. For lack of a better alternative, call it 

We now add a step 5^ between Sees. 3.5 and 3.6. The function £(■??) is clearly 
not an appropriate density. If events have been generated assuming five dif- 
ferent masses rrij, a graph of £(v) might appear as shown in Fig. B.l. We see 
that the density has ridges along the values of m for which events have been 
generated, with corresponding valleys in the regions between these values. 

An appropriately rescaled probability density p(x, m) can be generated by 
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Fig. B.2. The density p(x,m) formed by rescaling the function £(x,m) shown in 
Fig. B.l. Notice how this rescaling corrects for the fact that only events at specific 
masses were used in the construction of £(x, m). 

multiplying £(v) by a normalizing m-dependent factor s(m): 

p(x,m) = £(x,m)s(m). (B.l) 



This normalizing factor will correct for the fact that valleys have been intro- 
duced into the density by only generating events at specific masses rrij. The 
requirement that 

/ p(x, m) dx = p(m) (B.2) 



determines this normalizing factor uniquely. The desired joint probability den- 
sity p(v) is then given by 

P« = «y v (B.3) 
J dx' £{x , m) 

and the final step (Sec. 3.6) is exactly as before. The rescaled density of Fig. B.l 
is shown in Fig. B.2. 

We mention briefly a useful shortcut when calculating integrals such as that 
appearing in the denominator of Eq. B.3. Multidimensional integrals are dif- 
ficult to calculate in general, but this integral can be handled analytically 
provided one uses gaussian kernels. Assume as in Appendix A that the vector 
of known variables x is of d\ dimensions, that the vector of unknown variables 
a is of g?2 dimensions, and that the Monte Carlo has a covariance matrix E. 
Then the relevant formula is 

^ 1 ( — a T £ /-1 cA , N 

K[x,a)dx=—— exp — , (B.4) 
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where S' is the d.2 by g?2 sub-matrix of E formed by retaining elements with 
row and column numbers larger than d\. 



C Background events 

In the text we considered the problem of determining the top quark mass m 
for one candidate event. In a real analysis there will be n such events, and of 
those some fraction b are expected to be background events — events that do 
not contain a top quark at all. This appendix shows how to apply aPDE to a 
complete analysis. 

Signal and background Monte Carlo events are generated and used to con- 
struct the signal density p s (x, a), as described in Sees. 3.1-3.5, and the back- 
ground density Pb{x), which is independent of a. From a careful analysis of 
background efficiencies we determine the probability p(b) that a fraction b of 
our events are background events. In previous sections of this article Xi re- 
ferred to Monte Carlo events; in this section we change notation and label the 
n observed data events by x±, .., x n . 

The goal is to compute the posterior density p(a\x\, ..,x n ). Since the observa- 
tions xi,..,x n are assumed to be independent, p(x±, .., x n \a, b) factors into a 
product: 

n 

p(x!, ..,x n \a,b) = Y[p(xi\a, b). (C.l) 

i=i 

The probability p(xi\a, b) for the i th data event can be written in terms of the 
signal and background probability densities as 

p(xi\a, b) = (1 - b) p s (xi\a) + bp b (xi), (C.2) 

where p(x\a) = p(x,a)/p(a). Integrating out the nuisance parameter b in 
Eq. C.l leaves 

p(a|fi, ..,£„) = Np{a) j lf[[(l-b)p s (xi\a)+bp b (xi)]\p(b)db, (C.3) 

where Af is a normalization factor ensuring that / p(a\xi, ..,x n ) da = 1, and 
p(a, b) = p(a)p(b) is assumed. 

The most likely values of the parameters a are then those for which p(a\xi, ..,x n ) 
achieves its maximum, and the uncertainty on these values can be estimated 
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from the width of the peak. Other frequently-used best estimates and their 
errors are easily computed, if desired, from straightforward manipulation of 
the posterior density p(a\xi, ..,x n ). 
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